<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>COMP332 RISC Notes</title>
  </head>

  <body>
    <h1>Macquarie University, Department of Computing</h1>
    <h2>COMP332 Compiler Lecture Notes: RISC Architecture</h2>

    <p>The Obr compiler generates code to run on a simple RISC
    architecture that was originally described by Niclaus Wirth in his
    book "Theory and Techniques of Compiler Constuction" (which may be
    downloaded from
    <a href="http://www-old.oberon.ethz.ch/WirthPubl/CBEAll.pdf">http://www-old.oberon.ethz.ch/WirthPubl/CBEAll.pdf</a>).
    This, in turn, was designed to be close in spirit the MIPS
    architecture which was originally designed by John L. Hennessey of
    Stamford University.</p>

    <p>These notes provide a brief overview of RISC architecture sufficient for
    understanding the code generated by the Obr compiler used in
    COMP332 exercises and assignments.  Further detailed information about the RISC
    instruction set and machine execution may be gleaned by browsing the
    files <code>RISC.scala</code> and <code>RISCISA.scala</code> in this code
    bundle.</p>

    <h3>Memory and data types</h3>

    <p>RISC is a 32-bit architecture in which all general purpose registers,
    memory addresses and machine instructions are 32 bits wide. In
    the subset provided by our emulator code RISC instructions support
    the direct manipulation of 32-bit integers stored in twos-complement
    form. Furthermore, it supports the direct addressing of a memory space
    of upto 2^32 bytes. While addresses in the RISC architecture refer to
    individual bytes, all 32-bit values can only be directly accessed if
    they are aligned on 4 byte boundaries.</p>

    <h3>Registers</h3>

    The RISC architecture provides access to 32
    general purpose 32-bit integer registers named $0-$31.
    While most of these registers may be used for any task,
    some of them are allocated to special functions:<p>

    <ul>
        <li><p><b>$0</b> this is generally constrained to always
            contain the value 0.</p></li>
        <li><p><b>$28</b> (PC) this is the <em>program counter</em>,
            which contains the address of the instruction that is
            currently being executed.</p></li>
        <li><p><b>$29</b> (FP) this is the <em>frame pointer</em>,
            which points to the first byte after the top of the
            <em>frame</em> (or <em>activation record</em>) associated
            with the current procedure activation.</p></li>
        <li><p><b>$30</b> (SP) this is the <em>stack pointer</em>,
            which points to the first byte of the frame associated
            with the current procedure activation.</p></li>
        <li><p><b>$31</b> (LNK) this is the <em>link pointer</em>,
            in which return addresses are stored during subroutine
            calls (using the <code>bsr</code> instruction).</p></li>

    </ul>

    Note: the frame associated with a procedure activation is a block
    of memory allocated on the machine stack in which the procedure
    stores its local variables, temporaries, return address and so forth.<p>

    <h3>Instruction set</h3>

    For our purposes the RISC architecture has instructions that can
    be categorised as follows:
    <ul>
      <li>load/store (memory access)
      <li>integer arithmetic and bitwise operations
      <li>control transfer
      <li>terminal IO
    </ul><p>

    RISC instructions themselves come in four different formats, each of
    which packs the instruction and information about its operands into
    a single 32-bit word. These formats are:
    <ul>
      <li><b>type 0</b>: <code>inst reg1, reg2, reg3</code> an instruction with
          three operands all of which are registers.
      <li><b>type 1</b>: <code>inst reg1, reg2, im</code> an instruction with
          three operands two of which are registers and one of which is an
          immediate (fixed constant) 16-bit value (mostly arithmetic and
          bitwise operations).
      <li><b>type 2</b>: <code>inst reg1, reg2, disp</code> similar to type 2
          instructions, but generally used for branching, load/store and terminal
          IO operations (the fixed constant here is again 16 bits and is called
          <code>disp</code> for <em>displacement</em>).
      <li><b>type 3</b>: <code>inst disp</code> an instruction with a single
          integer operand which is an immediate (fixed constant) displacement
          represented as a 26-bit value (generally used for branching instructions).
    </ul>
    Notice here that the constant values contained in instructions cannot be
    32-bit values, since all of the information for each complete instruction (its
    identity, register operand information and any constant values) must be packed
    into a single 32-bit word. In most cases these 16- and 26-bit constants are
    <em>sign extended</em> to 32-bits, although when given as operands to bitwise
    operations they are simply converted to 32-bit values by padding them on the
    left (high bits) with 0s.<p>

    <em>Load/store instructions</em> are the only instructions that
    access memory.  The <code>ldw</code> and <code>stw</code>
    instructions load and store word/single quantities from/to integer
    registers.

<pre>
ldw reg1, reg2, disp   ! Load word from address reg2 + disp into reg1.
stw reg1, reg2, disp   ! Store contents of reg1 to address reg2 + disp.
</pre>

    The RISC architecture also provides byte load/store <code>ldb</code>
    and <code>stb</code> and push/pop instructions <code>push</code>
    and <code>pop</code> (which maintain a stack discipline by appropriately
    updating the address stored in <code>reg2</code>). However, as things
    stand the Obr compiler does not make use of these instructions.<p>

    Here are some loading and storing examples:
<pre>
ldw $2, $1, 0      ! Load from the address stored in $1 into register $2
ldw $2, $1, 8      ! Load the word from an 8 byte offset from the
                   ! address stored in $1 into register $2.
stw $31, $0, 32    ! Store the value held in the link pointer (which is
                   ! probably a return address) in memory at absolute
                   ! address 32 (because register $0 always contains 0).
</pre>

    Most <em>arithmetic</em> operations come in type 0 (operating on three registers) and type 1 (operating on two registers and a constant) versions. In each case the first operand specifies the <em>destination register</em> in which a calculated result will be placed and the last two operands specify the <em>sources</em> of the data upon which the operation will act. Notice here that there is noting to prevent an instruction from writing its result back into one of the registers it is using as a source. <p>

    Here is a complete list of the 32-bit arithmetic and bitwise operations provided by our version of the RISC architecture:
<pre>
add dreg, reg1, reg2   ! add values in registers reg1 and reg2 and place result in dreg
addi dreg, reg1, im    ! add value in register reg1 to constant value im and place result in dreg
sub dreg, reg1, reg2   ! subtract value in register reg2 from that in reg1 and place result in dreg
subi dreg, reg1, im    ! subtract constant value im from value in register reg1 and place result in dreg
mul dreg, reg1, reg2   ! multiply values in registers reg1 and reg2 and place result in dreg
muli dreg, reg1, im    ! multiply value in register reg1 to constant value im and place result in dreg
div dreg, reg1, reg2   ! divide value in register reg1 by that in reg2 and place result in dreg
divi dreg, reg1, im    ! divide value in register reg1 by constant value im and place result in dreg
mod dreg, reg1, reg2   ! take remainder of division of value in register reg1 by that in reg2 and place result in dreg
modi dreg, reg1, im    ! take remainder of division of value in register reg1 byconstant value im and place result in dreg
and dreg, reg1, reg2   ! bitwise and values in registers reg1 and reg2 and place result in dreg
andi dreg, reg1, im    ! bitwise and value in register reg1 with constant value im and place result in dreg
or dreg, reg1, reg2    ! bitwise or values in registers reg1 and reg2 and place result in dreg
ori dreg, reg1, im     ! bitwise or value in register reg1 with constant value im and place result in dreg
xor dreg, reg1, reg2   ! bitwise exclusive or values in registers reg1 and reg2 and place result in dreg
xori dreg, reg1, im    ! bitwise exclusive or value in register reg1 with constant value im and place result in dreg
</pre>

   This architecture also combines the common operations of bit shifting, negation and the copying of values from one register to another into a single family of "move" instructions:

<pre>
mov dreg reg1 reg2     ! shift the value in reg2 the number of bits specified by the value
                       ! in reg1 and put the result in dreg
movi dreg reg1 im      ! shift the constant value im the number of bits specified by the value
                       ! in reg1 and put the result in dreg
mvn dreg reg1 reg2     ! shift the value in reg2 the number of bits specified by the value
                       ! in reg1, negate that shifted number and put the result in dreg
mvni dreg reg1 im      ! shift the constant value im the number of bits specified by the value
                       ! in reg2, negate that shifted number and put the result in dreg
</pre>

   Note: in each of these instructions, if register <code>reg1</code> contains the positive number n then the bit shifter shifts the value in the other source operand n bits to the <em>left</em>. On the other hand, if register <code>reg1</code> contains the negative number -n then the bit shifter shifts the value in the other source operand n bits to the <em>right</em>.<p>

   In the Obr compiler we mainly use the move instructions to simply copy numbers from one register to another
<pre>
mov $1, $0, $2         ! copies the value from register $2 into register $1 without shifting
                       ! (because register $0 always contains 0)
</pre>
    or to negate the value stored in a register:
<pre>
mvn $12, $0, $12       ! negate the value stored in register $12
</pre>

    However, we also combine it with the <code>or</code> instruction in order to load a 32-bit constant values into a register. We are required to do this because the 3rd (constant) operand of type 1 and type 2 instructions are only 16-bit numbers. So if we wish to load a 32-bit constant we must first split it into two 16-bit pieces and load these separately into the high (using the <code>mov</code> instruction) and low (using the <code>or</code> instruction) 16-bit segments of the destination register. For example, if we have a 32-bit constant <code>int</code> we divide it into two 16-bit portions <code>int_hi</code> and <code>int_lo</code> and load it into register <code>$10</code> using the following code:
<pre>
movi $10, $0, 16       ! Load the constant 16 into register $10 - this constant
                       ! specifies the number of bits to shift int_hi as it is loaded
movi $10, $10, int_hi  ! Shift int_hi 16 bits to the left and load the result into register $10.
                       ! In other words, this loads int_hi into the high 16 bits of register $10.
ori $10, $10, int_lo   ! Bitwise or int_lo onto the low 16-bits of the contents of register $10.
</pre>



    <em>Control transfer</em> instructions alter the program counter.
    They are used to implement source-level control statements and
    procedure calls. All branch instructions are type 3 instructions,
    taking a single 26-bit operand which is interpreted as a relative
    instruction offset. So, for example. the instruction
    <code>br -10</code> will cause execution to jump to the instruction
    10 instructions prior to the current one. In RISC assembly code
    places that we wish to jump to are marked by symbolic labels and
    our "assembler" automatically turns branches to these labels into
    appropriate offset values.
    <p>

    The <code>br</code> (branch) instruction is an
    unconditional transfer of control.

<pre>
br label1              ! Transfer control to label label1.
</pre>

    Subroutine / procedure calling is implemented <code>bsr</code> instruction,
    which saves the value of the program counter + 1 into the link pointer
    (register <code>$31</code>) before the actual branch action is performed.
    Typically, on entry to a procedure the contents of the link pointer is
    written into the frame on the call stack of this new procedure activation.
    On the other hand the <code>ret</code> instruction takes a single register
    operand and makes a jump to the address whose offset from the current
    instruction is given by the content of that register. So we may effect a
    procedure return by loading the stored return address from the frame of the
    current procedure activation into the link pointer and executing the instruction
    <code>ret $31</code>. <p>

    As a special case, if the register referred to by a <code>ret</code>
    instruction contains the value 0 then execution of that instruction will cause
    the machine to halt completely. So the following instruction is used by the
    Obr compiler to halt the execution of a program:

<pre>
ret $0                ! halts the machine, since register $0 is guaranteed to contain 0
</pre>

    Conditional transfers are implemented with the aid of a set of two
    <em>condition codes</em> called <code>Z</code> for zero and <code>N</code> for
    negative. These are set by the following instructions:

<pre>
cmp reg1, reg2     ! Subtract the value of reg2 from that of reg1
                   !    if the result is 0 then set Z to 1 else set it to 0
                   !    if the result is negative then set N to 1 else set it to 0
cmp reg1, im       ! Similar to cmp, but subtracts the constant value im from that in
                   ! register reg1
</pre>

    Once the condition codes have been set, branch instructions can be
    used to transfer control if particular condition code bits are
    set.  These conditional branch instructions are:

<pre>
beq label          ! branch if Z = 1
bne label          ! branch if Z = 0
blt label          ! branch if N = 1
bge label          ! branch if N = 0
ble label          ! branch if N = 1 or Z = 1
bgt label          ! branch if N = 0 and Z = 0
</pre>

    <p><em>Terminal IO</em> instructions are a slightly unusual feature of this architecture. Usually interations with hardware external to a processor would be achieved via some kind of <em>memory mapped IO</em> mechanism, where registers of the IO hardware would be mapped to memory addresses which may then be written to and read from by the processor. However, for simplicity's sake the RISC architecture provides some higher level instructions for writing and reading integers to and from the terminal: </p>

<pre>
rd reg             ! read a signed 32-bit integer in decimal format from
                   ! the terminal and store it in register reg.
wrd reg            ! write the contents of register reg to the terminal
                   ! as a signed decimal number.
wrh reg            ! write the contents of register reg to the terminal
                   ! as an (unsigned) hexadecimal number.
wrl                ! write a newline character to the terminal.
</pre>

    <h3>Assembler syntax</h3>

    The RISC assembler syntax is fairly conventional.  Instruction
    syntaxes are illustrated by the preceding examples.  Comments
    extend from an exclamation mark to the end of line. Labels are
    declared by following them by a colon, as in:

<pre>
label2:
    add $1, $1, $1            ! double value held in register $1
    cmpi $1, 256              ! compare value in register $1 with constant 256
    ble label2                ! branch back if comparison showed that value in $1 is &lt;= 256
</pre>

As an example, when the Obr compiler compiles the Obr program
<pre>
PROGRAM X (x : INTEGER) : INTEGER;
VAR
    i : INTEGER;
    sum : INTEGER;
BEGIN
    sum := 0;
    FOR i := 1 TO 10 DO
        sum := sum + i;
    END
    RETURN sum;
END X.
</pre>
it outputs the following RISC assembly code
<pre>
    ! Prologue
    movi $27, $0, 0
    ! StW(Local(0),Read())
    rd $1
    stw $1, $27, 0
    ! StW(Local(8),IntDatum(0))
    movi $1, $0, 0
    stw $1, $27, 8
    ! StW(Local(4),IntDatum(1))
    movi $1, $0, 1
    stw $1, $27, 4
    ! StW(Local(20),IntDatum(10))
    movi $2, $0, 10
    stw $2, $27, 20
    ! Bne(CmpgtW(LdW(Local(4)),LdW(Local(20))),Label(3))
    ldw $1, $27, 4
    ldw $2, $27, 20
    cmp $1, $2
    movi $1, $0, 1
    bgt label6
    movi $1, $0, 0
label6:
    cmpi $1, 0
    bne label3
    ! Jmp(Label(2))
    br label2
    ! LabelDef(Label(4))
label4:
    ! StW(Local(4),AddW(LdW(Local(4)),IntDatum(1)))
    ldw $1, $27, 4
    movi $2, $0, 1
    add $1, $1, $2
    stw $1, $27, 4
    ! LabelDef(Label(2))
label2:
    ! StW(Local(8),AddW(LdW(Local(8)),LdW(Local(4))))
    ldw $1, $27, 8
    ldw $2, $27, 4
    add $1, $1, $2
    stw $1, $27, 8
    ! Bne(CmpltW(LdW(Local(4)),LdW(Local(20))),Label(4))
    ldw $1, $27, 4
    ldw $2, $27, 20
    cmp $1, $2
    movi $1, $0, 1
    blt label7
    movi $1, $0, 0
label7:
    cmpi $1, 0
    bne label4
    ! LabelDef(Label(3))
label3:
    ! Write(LdW(Local(8)))
    ldw $1, $27, 8
    wrd $1
    wrl
    ! Ret()
    br label5
    ! Epilogue
label5:
    ret $0
</pre>


    <hr>
<ADDRESS><A href="mailto:asloane@ics.mq.edu.au">Tony Sloane</A> and <A
  href="mailto:domv@ics.mq.edu.au">Dominic Verity</A></ADDRESS>
  <!-- Created: Thu Jul  9 11:51:06 EST 1998 -->
<!-- hhmts start -->Last Modified: Tue Oct 05 17:00:54 +1100 2010<!-- hhmts end --><BR>
  <A href="http://www.mq.edu.au/legalstuff.html">Copyright (c) 2010-2014 by Macquarie
University. All rights reserved.</A><BR></BODY></HTML>

